
*Import and merge the three datasets used together:

use "$root/Data/Original/entrance_2019.dta", clear
merge m:1 EGID using "$root/Data/Original/building_char2019.dta"
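* Optional check (a sketch, not part of the original pipeline): tabulate the merge
* result before filtering to see how many entrances have no matching building record.
tab _merge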
*Most of the unmatched buildings have been collapsed, so they are not relevant for our data anyway.
keep if _merge==3
drop _merge

*check for duplicates: 
duplicates tag EGID EDID, gen(dup)
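* Optional check (a sketch, not part of the original pipeline): dup counts the
* surplus copies of each EGID-EDID pair, so dup > 0 flags duplicated entrances.
tab dup
count if dup > 0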

*The duplicates are due to double registration of buildings in Biel (German and French address)

*merge the apartment information: 
merge m:1 EGID EDID using "$root/Data/Original/apartment_char2019.dta"
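* Optional check (a sketch, not part of the original pipeline): inspect the merge
* result before dropping apartment records without a matching entrance (_merge==2).
tab _merge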
drop if _merge==2
drop _merge

rename STRNAME Strasse
rename number HausNr


*Some addresses have several buildings
duplicates tag PLZ Strasse HausNr, gen(duplicates)

gsort -duplicates PLZ Strasse HausNr
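* Optional check (a sketch, not part of the original pipeline): count the records
* that share an address with at least one other record before any manual review.
count if duplicates > 0
tab duplicates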

**Manual data processing occurred here to eliminate duplicate building identifications.
**Some buildings are reported multiple times (for example, after reconstruction), and some appear in the database both as building projects and as finalised buildings.
**We eliminated these duplicates after manually recoding some information based on research.
**We cannot share this code for replication, because the global building ID is publicly available and could be linked to addresses, which would potentially harm anonymity.
**Note that these cleaning steps were meant to ensure that the entire sample is potentially attainable.
**The final sample is much smaller, so it is relatively unlikely that we manually changed a building that is contained in the final estimation sample.





save "$root/Data/Original/bfs_merge_2019.dta", replace

